Human Mutation — Latest Matching Preprints

1

Leveraging cancer mutation data to predict the pathogenicity of germline missense variants

Haque, B.; Cheerie, D.; Pan, A.; Curtis, M.; Nalpathamkalam, T.; Nguyen, J.; Salhab, C.; Thiruvahindrapura, B.; Zhang, J.; Couse, M.; Hartley, T.; Morrow, M. M.; Price, E. M.; Walker, S.; Malkin, D.; Roth, F. P.; Costain, G.

2024-03-13 genetic and genomic medicine 10.1101/2024.03.11.24304106 medRxiv

Top 0.1%

37.5%

Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid in the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including presence/absence and classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict variant classifications of germline missense variants in ClinVar using Cancer Hotspot data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part from searching public and private genome-wide sequencing datasets from ~1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2-33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the logistic regression and random forest models achieved high areas under the precision-recall curve of 0.847 and 0.829, respectively. With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
Author summary: Our study introduces an approach to improve the interpretation of rare genetic variation, specifically missense variants that can alter proteins and cause disease. We found that published evidence from somatic cancer sequencing studies may be relevant to understanding the impact of the same variant in the context of rare inherited (Mendelian) disorders. By using widely available datasets, we noted that many cancer driver mutations have also been observed as rare germline variants associated with inherited disorders. This intersection led us to employ machine learning techniques to assess how cancer mutation data can predict the pathogenicity of germline variants. We trained machine learning models and tested them on a separate dataset curated by searching public and private genome-wide sequencing data from over a million participants. Our models successfully identified pathogenic genetic changes, demonstrating strong performance in predicting disease-causing variants. This study highlights that cancer mutation data can enhance the interpretation of rare missense variants, aiding in the diagnosis and understanding of rare diseases. Integrating this approach into current genetic classification frameworks could be beneficial, and it opens new avenues for leveraging existing cancer research to benefit broader genetic research and diagnostics for rare genetic conditions.
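
The abstract above reports an odds ratio with a 95% confidence interval but not the underlying 2x2 table. As a minimal sketch of how such a figure is commonly obtained, the snippet below computes an odds ratio with a Wald interval on the log scale. The function name and the counts are hypothetical placeholders for illustration only; they are not the study's data, and the authors may have used a different interval method.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: group of interest with / without the outcome
    c, d: comparison group with / without the outcome
    All four cells must be non-zero for this simple form.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical toy counts, NOT taken from the preprint.
or_value, ci_low, ci_high = odds_ratio_wald_ci(a=40, b=10, c=20, d=80)
print(f"OR = {or_value:.1f}, 95% CI {ci_low:.1f}-{ci_high:.1f}")
```

With real data, the four cells would be the counts of hotspot-derived and other missense variants in the same genes, split by whether ClinVar classifies them as likely pathogenic/pathogenic.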

2

The performance of in silico prediction tools for variant curation in a panel of cancer genes.

Nelson, N.; Niaz, A.; Fairfax, K.; Bryan, T. M.; Lucas, S.; Dickinson, J.

2025-07-30 genetic and genomic medicine 10.1101/2025.07.29.25331316 medRxiv

Top 0.1%

33.4%

Rare single base pair changes in genes are an important cause of disease, as they can reside in key regions of the gene and influence biological function by altering protein conformation and protein interactions. Generating the experimental evidence needed to define the consequences of these gene variants is time consuming and costly. These challenges have led to the development of a plethora of in silico prediction tools. These tools frequently use similar sources of information and are trained on overlapping multi-gene truth datasets. However, the performance of these in silico tools has often not been quantitatively validated for individual genes. Here we have applied the ClinGen Sequence Variant Interpretation Working Group's recommended in silico score thresholds to a set of predisposition gene variants with established pathogenicity/benignity. Of the genes assessed (BRCA1, BRCA2, TP53, TERT and ATM), in silico tool predictions showed inferior sensitivity (<65%) for pathogenic TERT variants and inferior sensitivity (≤81%) for benign TP53 variants. This validation study highlights that in silico tool performance can be gene-specific and is dependent on the training set on which the algorithm is built. Where there are sufficient numbers of established benign and pathogenic missense variants based on clinical and functional evidence, the use of in silico tool scores should be validated for individual genes. For genes where this is not possible and gene-agnostic in silico score cut-offs are used, consideration of missense variant-protein structural impact relationships is suggested.

3

Systematic large-scale application of ClinGen InSiGHT APC-specific ACMG/AMP variant classification criteria substantially alleviates the burden of variants of uncertain significance in ClinVar and LOVD databases

Yin, X.; Richardson, M. E.; Laner, A.; Shi, X.; Ognedal, E.; Vasta, V.; Hansen, T. v. O.; Pienda, M.; Ritter, D.; den Dunnen, J. T.; Hassanin, E.; Lyman Lin, W.; Borras, E.; Krahn, K.; Nordling, M.; Martins, A.; Mahmood, K.; Nadeau, E. A. W.; Beshay, V.; Tops, C.; Genuardi, M.; Pesaran, T.; Frayling, I. M.; Capella, G.; Latchford, A.; Tavtigian, S. V.; Maj, C.; Plon, S. E.; Greenblatt, M. S.; Macrae, F. A.; Spier, I.; Aretz, S.

2024-05-04 genetic and genomic medicine 10.1101/2024.05.03.24306761 medRxiv

Top 0.1%

29.5%

Background: Pathogenic constitutional APC variants underlie familial adenomatous polyposis, the most common hereditary gastrointestinal polyposis syndrome. To improve variant classification and resolve the interpretative challenges of variants of uncertain significance (VUS), APC-specific ACMG/AMP variant classification criteria were developed by the ClinGen-InSiGHT Hereditary Colorectal Cancer/Polyposis Variant Curation Expert Panel (VCEP). Methods: A streamlined algorithm using the APC-specific criteria was developed and applied to assess all APC variants in ClinVar and the InSiGHT international reference APC LOVD variant database. Results: A total of 10,228 unique APC variants were analysed. Among the ClinVar and LOVD variants with an initial classification of (Likely) Benign or (Likely) Pathogenic, 94% and 96% remained in their original categories, respectively. In contrast, 41% of ClinVar and 61% of LOVD VUS were reclassified into clinically actionable classes, the vast majority as (Likely) Benign. The total number of VUS was reduced by 37%. In 21 out of 36 (58%) promising APC variants that remained VUS despite evidence for pathogenicity, a data mining-driven work-up allowed their reclassification as (Likely) Pathogenic. Conclusions: The application of APC-specific criteria substantially reduced the number of VUS in ClinVar and LOVD. The study also demonstrated the feasibility of a systematic approach to variant classification in large datasets, which might serve as a generalisable model for other gene-/disease-specific variant interpretation initiatives. It also allowed for the prioritization of VUS that will benefit from in-depth evidence collection. This subset of APC variants was approved by the VCEP and made publicly available through ClinVar and LOVD for widespread clinical use.

4

Mismatch repair gene specifications to the ACMG/AMP classification criteria: Consensus recommendations from the InSiGHT ClinGen Hereditary Colorectal Cancer / Polyposis Variant Curation Expert Panel

Plazzer, J. P.; Macrae, F.; Yin, X.; Thompson, B. A.; Farrington, S. M.; Currie, L.; Lagerstedt-Robinson, K.; Frederiksen, J. H.; van Overeem Hansen, T.; Graversen, L.; Frayling, I. M.; Akagi, K.; Yamamoto, G.; Al-Mulla, F.; Ferber, M. J.; Martins, A.; Genuardi, M.; Kohonen-Corish, M.; Baert-Desurmont, S.; Spurdle, A. B.; Capella, G.; Pineda, M.; Woods, M. O.; Rasmussen, L. J.; Heinen, C. D.; Scott, R. J.; Tops, C. M.; Greenblatt, M. S.; Dominguez-Valentin, M.; Ognedal, E.; Borras, E.; Leung, S. Y.; Mahmood, K.; Holinski-Feder, E.; Laner, A.

2024-05-14 genetic and genomic medicine 10.1101/2024.05.13.24307108 medRxiv

Top 0.1%

27.0%

Background: It is known that gene- and disease-specific evidence domains can potentially improve the capability of the ACMG/AMP classification criteria to categorize pathogenicity for variants. We aimed to include gene-disease-specific clinical, predictive, and functional domain specifications to the ACMG/AMP criteria with respect to MMR genes. Methods: Starting with the original criteria (InSiGHT criteria) developed by the InSiGHT Variant Interpretation Committee, we systematically addressed specifications to the ACMG/AMP criteria to enable more comprehensive pathogenicity assessment within the ClinGen VCEP framework, resulting in MMR gene-specific ACMG/AMP criteria. Results: A total of 19 criteria were specified, 9 were considered not applicable, and there were 35 variations of strength of the evidence. A pilot set of 48 variants was tested using the new MMR gene-specific ACMG/AMP criteria. Most variants remained unaltered compared with the previous InSiGHT criteria; however, an additional four variants of uncertain significance were reclassified to P/LP or LB by the MMR gene-specific ACMG/AMP criteria framework. Conclusion: The MMR gene-specific ACMG/AMP criteria have proven feasible for implementation, are consistent with the original InSiGHT criteria, and enable additional combinations of evidence for variant classification. This study provides a strong foundation for implementing gene-disease-specific knowledge and experience, and could also hold immense potential in a clinical setting.

5

Application of the ACMG/AMP framework to capture evidence relevant to predicted and observed impact on splicing: recommendations from the ClinGen SVI Splicing Subgroup

Walker, L. C.; de la Hoya, M.; Wiggins, G. A.; Lindy, A.; Vincent, L. M.; Parsons, M.; Canson, D. M.; Bis-Brewer, D.; Cass, A.; Tchourbanov, A.; Zimmermann, H.; Byrne, A. B.; Pesaran, T.; Karam, R.; Harrison, S. M.; ClinGen Sequence Variant Interpretation Working Group; Spurdle, A. B.

2023-02-26 genetic and genomic medicine 10.1101/2023.02.24.23286431 medRxiv

Top 0.1%

26.6%

The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1 (null variant in a gene where loss-of-function is the mechanism of disease), PS3 (functional assays show damaging effect on splicing), PP3 (computational evidence supports a splicing effect), BS3 (functional assays show no damaging effect on splicing), BP4 (computational evidence suggests no splicing impact), and BP7 (silent change with no predicted impact on splicing). However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation (SVI) Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. Our study utilised empirically derived splicing evidence to: 1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, 2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and 3) exemplify methodology to calibrate bioinformatic splice prediction tools. We propose repurposing of the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely BP7 may be used to capture RNA results demonstrating no impact on splicing for both intronic and synonymous variants, and for missense variants if protein functional impact has been excluded. Furthermore, we propose that the PS3 and BS3 codes are applied only for well-established assays that measure functional impact that is not directly captured by RNA splicing assays. 
We recommend the application of PS1 based on similarity of predicted RNA splicing effects for a variant under assessment in comparison to a known Pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA assay evidence described aim to help standardise variant pathogenicity classification processes and result in greater consistency when interpreting splicing-based evidence.

6

A structure-aware framework for genomic variant interpretation in genetic skeletal disorders

Piticchio, S. G.; Hosseini, N.; Grigelioniene, G.; Orellana, L.

2026-03-17 genomics 10.64898/2026.03.15.711892 medRxiv

Top 0.1%

24.1%

Background: Genetic skeletal disorders (GSDs) comprise a heterogeneous group of rare, predominantly monogenic conditions that are increasingly diagnosed through high-throughput sequencing. While gene discovery has progressed rapidly, interpretation of pathogenic and uncertain variants remains a major bottleneck, in part because their functional consequences are determined at the protein structure level. However, a systematic assessment of structural knowledge across GSD-associated genes is currently lacking. Here, we present a comprehensive protein structure-centric analysis of 674 protein-coding genes implicated in GSDs. Methods: We integrated experimental structures, AlphaFold2 (AF2) models, multimeric states, protein-protein interactions, and ClinVar variant annotations. Results: We quantify experimental structural availability and sequence coverage, revealing that 37% of GSD proteins lack any experimental structure and that, among proteins with structures, sequence coverage is often incomplete. We show that AF2 models provide high-confidence structural information for a substantial subset of proteins lacking experimental data, but that model reliability strongly correlates with existing structural coverage. Analysis of multimeric assemblies and co-occurring partners demonstrates that many GSD proteins function as obligate multimers, highlighting the importance of interface-level interpretation of variants. Finally, mapping clinically annotated missense variants onto representative protein structures illustrates how structural context can inform the interpretation of pathogenic and uncertain variants, particularly at interaction interfaces. Conclusions: Together, this work provides a structure-aware reference framework for GSD genes, highlighting systematic gaps in current protein knowledge and demonstrating how integration of structural data can guide genomic variant interpretation.
Our observations support a broader principle of structural equivalence, whereby distinct variants converge on shared structural perturbations that explain clustering patterns and enable mechanistic interpretation of nearby variants of uncertain significance.

7

From Past to Present: Pompe Disease, Pseudodeficiency Alleles, and Diagnostic Challenges

Giliberto, F.; Buonfiglio, P. I.; Capellino, G.; Massini, C. L.; Dalamon, V.; Luce, L.; Carcione, M.

2024-10-04 genetic and genomic medicine 10.1101/2024.10.03.24314698 medRxiv

Top 0.1%

23.8%

Pompe disease is an autosomal recessive disorder caused by GAA variants leading to acid alpha-glucosidase deficiency. Diagnosis is challenging due to the variable phenotypic presentation and overlap with other conditions. Traditionally, diagnosis relies on measuring enzyme activity, but next-generation sequencing (NGS) advancements have improved accuracy. However, interpreting variants is complex, especially because pseudodeficiency alleles mimic disease-causing variants. We present two patients harboring the pseudodeficiency allele NM_000152.5(GAA):c.271G>A, p.Asp91Asn, which is confusing due to inaccurate reports and results related to enzymatic activity. The first case was a recently published controversial case of a 700-year-old mummy in which the authors classified the variant as pathogenic. The second patient had symptoms compatible with late-onset Pompe disease and was homozygous for the variant. We aimed to determine the correct variant classification using GAA:c.271G>A as a model and to achieve a genetic diagnosis of the second patient. This variant was analyzed following international guidelines (ACMG-AMP) and reviewed with the Lysosomal Diseases Variant Curation Expert Panel. The second patient underwent NGS. We demonstrated that GAA:c.271G>A meets the criterion of being classified as benign for Pompe. Additionally, the second patient carried a heterozygous pathogenic PABPN1 variant associated with oculopharyngeal muscular dystrophy, which better explained the clinical features. This underscores the importance of expanding the genetic analysis in the presence of pseudodeficiency alleles that can mask the true cause of the disease and highlights the fact that an accurate diagnosis should adhere to guidelines on variant curation to reduce the risk of misdiagnosis, which could result in inadequate care and risky medical decisions. 

8

Revision of splicing variants in the DMD gene

Davydenko, K.; Skoblov, M.; Filatova, A.

2024-02-02 genetics 10.1101/2024.01.31.578175 medRxiv

Top 0.1%

23.1%

Background: Pathogenic variants in the dystrophin (DMD) gene lead to X-linked recessive Duchenne muscular dystrophy (DMD) and Becker muscular dystrophy (BMD). Nucleotide variants that affect splicing are a known cause of hereditary diseases. However, their representation in the public genomic variation databases is limited due to the low accuracy of their interpretation, especially if they are located within exons. The analysis of splicing variants in the DMD gene is essential both for understanding the molecular mechanisms underlying the pathogenesis of dystrophinopathies and for selecting suitable therapies for patients. Results: Using deep in silico mutagenesis of the entire DMD gene sequence and subsequent SpliceAI splicing predictions, we identified 7,948 DMD single nucleotide variants that could potentially affect splicing, of which 863 were located in exons. Next, we analyzed over 1,300 disease-associated DMD SNVs previously reported in the literature (373 exonic and 956 intronic) and intersected them with SpliceAI predictions. We predicted that ~95% of the intronic and ~10% of the exonic reported variants could actually affect splicing. Interestingly, the majority (75%) of patient-derived intronic variants were located in the AG-GT terminal dinucleotides of the introns, while these positions accounted for only 13% of all intronic variants predicted in silico. Of the 97 potentially spliceogenic exonic variants previously reported in patients with dystrophinopathy, we selected 38 for experimental validation. For this, we developed and tested a minigene expression system encompassing 27 DMD exons. The results showed that 35 (19 missense, 9 synonymous, and 7 nonsense) of the 38 DMD exonic variants tested actually disrupted splicing. We compared the observed consequences of splicing changes between variants leading to severe Duchenne and milder Becker muscular dystrophy and showed a significant difference in their distribution. This finding provides extended insights into the relationship between the molecular consequences of splicing variants and clinical features. Conclusions: Our comprehensive bioinformatics analysis, combined with experimental validation, improves the interpretation of splicing variants in the DMD gene. The new insights into the molecular mechanisms of pathogenicity of exonic single nucleotide variants contribute to a better understanding of the clinical features observed in patients with Duchenne and Becker muscular dystrophy.

9

A high throughput splicing assay to investigate the effect of variants of unknown significance on exon inclusion

Scott, H. A.; Place, E.; Harper, E.; Mehrotra, S.; Center for Mendelian Genomics, Broad Institute; Huckfeldt, R.; Comander, J.; Pierce, E. A.; Bujakowska, K. M.

2022-12-06 genetic and genomic medicine 10.1101/2022.11.30.22282952 medRxiv

Top 0.1%

23.0%

Purpose: Inconclusive interpretation of the pathogenicity of variants is a common problem in Mendelian disease diagnostics. We hypothesized that some variants of unknown significance (VUS) may lead to aberrant pre-mRNA splicing. To address this, we have developed a high throughput splicing assay (HTSA) that can be used to test the effects of thousands of variants on exon recognition. Methods: 2296 reference, control and variant sequences from 380 exons of 89 genes associated with inherited retinal degenerations (IRDs) were cloned as a pool into a split-GFP HTSA construct and expressed in landing pad RCA7 HEK293T cells. Exon inclusion led to disruption of GFP and exon skipping led to GFP reconstitution, enabling separation of GFP+ve and GFP-ve cells by fluorescence activated cell sorting. After deep sequencing-based quantification of the studied sequences in each cell pool, the exon inclusion index (EII) was determined, where EII = GFP-ve oligo count / total oligo count. Results: HTSA showed high reproducibility when compared between different biological replicates (tetrachoric correlation coefficient r² = 0.83). Reference exon sequences showed a high level of exon recognition (median EII = 0.88), which was significantly reduced by mutations to the essential splice sites (donor site variants: median EII = 0.06; acceptor site variants: median EII = 0.48). Of the 748 studied VUSs, 47 variants led to decreased exon inclusion (ΔEII ≤ -0.3), with 11 variants showing a strong effect (ΔEII ≤ -0.6). Using the HTSA data we were able to provide a likely genetic diagnosis to five IRD cases. Conclusion: HTSA offers a robust method to study the effects of VUSs on exon recognition, allowing new diagnoses to be provided for patients with Mendelian disorders.
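
The exon inclusion index defined in the abstract above can be sketched in a few lines. This is an illustrative reading, not the authors' pipeline: it assumes the total oligo count is the sum of the GFP-ve and GFP+ve counts, and that ΔEII is the variant EII minus the matched reference EII (the abstract does not spell out either point). Function names and labels are hypothetical.

```python
def exon_inclusion_index(gfp_neg_count, gfp_pos_count):
    """EII = GFP-ve oligo count / total oligo count.

    Assumption: total = GFP-ve + GFP+ve counts for the same oligo.
    """
    total = gfp_neg_count + gfp_pos_count
    if total == 0:
        raise ValueError("no reads observed for this oligo")
    return gfp_neg_count / total


def classify_delta_eii(eii_variant, eii_reference,
                       decreased_cutoff=-0.3, strong_cutoff=-0.6):
    """Label a variant using the ΔEII cut-offs quoted in the abstract.

    Assumption: ΔEII = EII(variant) - EII(reference).
    """
    delta = eii_variant - eii_reference
    if delta <= strong_cutoff:
        label = "strong decrease"
    elif delta <= decreased_cutoff:
        label = "decreased inclusion"
    else:
        label = "no call"
    return delta, label


# Illustrative inputs: the reported median reference EII (0.88) and the
# reported median donor-site EII (0.06) stand in for one matched pair.
reference_eii = exon_inclusion_index(gfp_neg_count=88, gfp_pos_count=12)
delta, label = classify_delta_eii(0.06, reference_eii)
print(f"reference EII = {reference_eii:.2f}, delta = {delta:.2f}, {label}")
```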

10

An optimized variant prioritization process for rare disease diagnostics: recommendations for Exomiser and Genomiser

Cooperstein, I. B.; Marwaha, S.; Ward, A.; Kobren, S. N.; Carter, J. N.; Undiagnosed Diseases Network; Wheeler, M. T.; Marth, G. T.

2025-04-20 genetic and genomic medicine 10.1101/2025.04.18.25326061 medRxiv

Top 0.1%

22.9%

Purpose: Whole-exome sequencing (WES) and whole-genome sequencing (WGS) are increasingly used as standard genetic tests to identify the diagnostic variants in rare disease cases. However, prioritizing these variants to reduce the time and burden of manual interpretation by clinical teams remains a significant challenge. The Exomiser/Genomiser software suite is the most widely adopted open-source software for prioritizing coding and non-coding variants. Despite its ubiquitous use, limited data-driven guidelines currently exist to optimize its performance for diagnostic variant prioritization. Based on detailed analyses of Undiagnosed Diseases Network (UDN) probands, this study presents optimized parameters and practical recommendations for deploying the Exomiser and Genomiser tools. We also highlight scenarios where diagnostic variants may be missed and propose alternative workflows to improve diagnostic success in such complex cases. Methods: We analyzed 386 diagnosed probands from the UDN, including cases with coding and non-coding diagnostic variants. We systematically evaluated how tool performance was affected by key parameters, including gene:phenotype association data, variant pathogenicity predictors, phenotype term quality and quantity, and the inclusion and accuracy of family variant data. Results: Parameter optimization significantly improved Exomiser's performance over default parameters. For WGS data, the percentage of coding diagnostic variants ranked within the top ten candidates increased from 49.7% to 85.5%, and for WES, from 67.3% to 88.2%. For non-coding variants prioritized with Genomiser, the top ten rankings improved from 15.0% to 40.0%. We also explored refinement strategies for Exomiser outputs, including using p-value thresholds and flagging genes that are frequently ranked in the top 30 candidates but rarely associated with diagnoses. Conclusion: This study provides an evidence-based framework for variant prioritization in WES and WGS data using Exomiser and Genomiser. These recommendations have been implemented in the Mosaic platform to support the ongoing analysis of undiagnosed UDN participants and provide efficient, scalable reanalysis to improve diagnostic yield. Our work also highlights the importance of tracking solved cases and diagnostic variants that can be used to benchmark bioinformatics tools.
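
The headline metric in the abstract above, the share of diagnostic variants ranked within the top ten candidates, can be sketched as follows. This is a generic benchmark calculation rather than the authors' code or any Exomiser API; the function name is hypothetical, the ranks are made-up toy values, and treating an unranked (filtered-out) diagnostic variant as a miss is an assumption.

```python
def top_k_rate(diagnostic_ranks, k=10):
    """Fraction of cases whose diagnostic variant is ranked within the top k.

    diagnostic_ranks: one entry per solved case; an integer rank (1 = best)
    or None when the diagnostic variant was not ranked at all, which is
    counted as a miss here (assumption).
    """
    if not diagnostic_ranks:
        raise ValueError("no cases supplied")
    hits = sum(1 for rank in diagnostic_ranks if rank is not None and rank <= k)
    return hits / len(diagnostic_ranks)


# Hypothetical ranks for five solved cases under two parameter settings.
default_ranks = [1, 3, 12, None, 7]
optimized_ranks = [1, 2, 9, 25, 4]

print(f"default top-10:   {top_k_rate(default_ranks):.0%}")
print(f"optimized top-10: {top_k_rate(optimized_ranks):.0%}")
```

Comparing the same set of solved cases under two parameter sets, as above, is what makes a before/after percentage meaningful.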

11

BayesQuantify: an R package utilized to refine the ACMG/AMP criteria according to the Bayesian framework

Liu, S.; Feng, X.; Bu, F.

2024-09-10 genetic and genomic medicine 10.1101/2024.09.08.24313284 medRxiv

Top 0.1%

22.8%

Background: Improving the precision and accuracy of variant classification in clinical genetic testing requires further specification and stratification of the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) criteria. While the Clinical Genome Resource (ClinGen) Bayesian framework enables quantitative evidence calibration for selected criteria, standardized tools to optimize evidence thresholds and systematically refine ACMG/AMP criteria remain underdeveloped. Methods: To address this gap, we developed BayesQuantify, an R package that provides a unified resource for quantifying evidence strength for ACMG/AMP criteria based on the Bayesian framework. BayesQuantify accepts a variant classification file as input and automatically calculates the odds of pathogenicity for each evidence strength, incorporating user-provided prior probabilities of pathogenicity. Through bootstrapping, BayesQuantify generates thresholds by aligning the 95% lower boundary of positive likelihood ratio/local positive likelihood ratio values with the odds of pathogenicity for different evidence levels. Three independent datasets (the ClinVar 2019 dataset, the ClinGen curated dataset, and the PTEN gene dataset) derived from ClinVar, HGMD, and gnomAD were utilized to evaluate the utility of BayesQuantify. Results: Validation across three independent datasets demonstrates that BayesQuantify delivers objective, consistent refinements for both categorical and continuous ACMG/AMP evidence. Specifically, we replicated the PP3/BP4 thresholds for four computational tools (BayesDel, VEST4, REVEL, and MutPred2) recommended by the ClinGen Sequence Variant Interpretation Working Group using the ClinVar 2019 dataset. Our analysis also indicated that the PM2 criterion should be downgraded from moderate to supporting evidence, aligning with ClinGen recommendations. Importantly, we have established thresholds for supporting, moderate, and strong evidence for in-silico tools using this tool, thereby expanding the application of PP3/BP4 criteria for missense variants in the PTEN gene. Conclusions: BayesQuantify is an accessible and user-friendly resource that enhances the rigor and reproducibility of ACMG/AMP criteria application. By facilitating evidence-based stratification and threshold optimization, the tool strengthens variant classification workflows, offering immediate value to clinical genetic testing laboratories and research communities. The package is freely available at https://github.com/liusihan/BayesQuantify.
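
To make "odds of pathogenicity for each evidence strength" concrete, here is a minimal Python sketch of the commonly cited Bayesian adaptation of the ACMG/AMP framework. It is not the BayesQuantify R package or its API. The constants are assumptions based on that published framework as generally described: odds of 350 for very strong evidence, each lower strength level taking the square root of the one above it, and a default prior of 0.10. The abstract itself only says that priors are user-provided.

```python
# Assumed constants from the commonly cited Bayesian ACMG/AMP framework;
# they are not stated in the abstract above.
ODDS_VERY_STRONG = 350.0
STRENGTH_EXPONENT = {
    "supporting": 1 / 8,
    "moderate": 1 / 4,
    "strong": 1 / 2,
    "very_strong": 1.0,
}


def odds_of_pathogenicity(strength):
    """Odds of pathogenicity assigned to one piece of evidence."""
    return ODDS_VERY_STRONG ** STRENGTH_EXPONENT[strength]


def posterior_probability(odds, prior=0.10):
    """Posterior = odds * prior / ((odds - 1) * prior + 1)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return odds * prior / ((odds - 1.0) * prior + 1.0)


for strength in STRENGTH_EXPONENT:
    odds = odds_of_pathogenicity(strength)
    post = posterior_probability(odds)
    print(f"{strength:12s} odds = {odds:7.2f}  posterior = {post:.3f}")
```

Under this reading, a threshold-calibration tool looks for the score cut-off at which the lower confidence bound of the observed positive likelihood ratio reaches the odds for a given strength level. Changing the prior changes the posterior that each level implies.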

12

Phenotype correlations with pathogenic DNA variants in the MUTYH gene

Thet, M.; Plazzer, J. P.; Capella, G.; Latchford, A.; Nadeau, E. A. W.; Greenblatt, M. S.; Macrae, F.

2024-05-15 genetic and genomic medicine 10.1101/2024.05.15.24307143 medRxiv

Top 0.1%

22.8%

MUTYH-associated polyposis (MAP) is an autosomal recessive disorder where the inheritance of constitutional biallelic pathogenic MUTYH variants predisposes a person to the development of adenomas and colorectal cancer (CRC). It is also associated with extracolonic and extraintestinal manifestations that may overlap with the phenotype of familial adenomatous polyposis (FAP). Currently, there are discrepancies in the literature regarding whether certain phenotypes are truly associated with MAP. This narrative review aims to explore the phenotypic spectrum of MAP to better characterise the MAP phenotype. A literature search was conducted to identify articles reporting on MAP-specific phenotypes. Clinical data from 2109 MAP patients identified from the literature showed that 1123 patients (53.2%) had CRC. Some patients with CRC had no associated adenomas, suggesting that adenomas are not an obligatory component of MAP. Carriers of the two missense founder variants, and possibly truncating variants, had an increased cancer risk when compared to those who carry other pathogenic variants. It has been suggested that somatic G:C>T:A transversions are a mutational signature of MAP, and could be used as a biomarker in screening and identifying patients with atypical MAP, or in associating certain phenotypes with MAP. The extracolonic and extraintestinal manifestations that have been associated with MAP include duodenal adenomas, duodenal cancer, fundic gland polyps, gastric cancer, ovarian cancer, bladder cancer and skin cancer. The association of breast cancer and endometrial cancer with MAP remains disputed. Desmoids and Congenital Hypertrophy of the Retinal Pigment Epithelium (CHRPEs) are rarely reported in MAP, but have long been seen in FAP patients, and thus could act as a distinguishing feature between the two. 
This collection of MAP phenotypes will assist in the assessment of pathogenic MUTYH variants using the American College of Medical Genetics and the Association for Molecular Pathology (ACMG/AMP) Variant Interpretation Guidelines, and ultimately improve patient care.

13

A Compendium of manually annotated genetic variants for Alkaptonuria-AKUHub

S, A.; T.C, A. K.; S, S.; S, S.; N, S.; R, V.; Scaria, V.; Mehta, R. B.

2023-02-23 genetic and genomic medicine 10.1101/2023.02.21.23286262 medRxiv

Top 0.1%

22.6%

Alkaptonuria, or black urine disease, is a rare autosomal recessive disorder caused by a dysfunctional homogentisate 1,2-dioxygenase (HGD) gene (3q13.33), leading to accumulation of homogentisic acid in the body. This inborn error in the metabolism of phenylalanine and tyrosine causes accumulation of homogentisic acid, leading to ochronosis, pigmentation in the sclera and ear cartilage, mitral valve calcification and osteoarthropathy. Advances in sequencing technologies have helped us to map genetic variants associated with alkaptonuria in diverse populations and regions. Currently, no centralized resource of all the reported actionable variants with uniformity in annotation exists for the HGD gene. We have compiled HGD exonic variants from various data sources and systematically annotated their pathogenicity according to the American College of Medical Genetics and the Association of Molecular Pathologists (ACMG/AMP) variant classification framework. A total of 1686 exonic variants were catalogued and manually curated, creating one of the most comprehensive Alkaptonuria variant databases (AKUHub), which is publicly available.

14

High-resolution mapping of DMD duplications using long-read sequencing enables precise carrier screening for Duchenne muscular dystrophy

Yang, J.; Dong, Y.; Wang, Z.; Sun, X.; Song, N.; Gu, S.; Zhang, X.; Guo, Y.; Sun, X.; Chen, S.; Wang, J.; Xiang, J.

2025-08-14 genetic and genomic medicine 10.1101/2025.08.11.25333458 medRxiv

Top 0.1%

22.5%

Purpose: Exon-level duplications in the DMD gene present interpretive challenges due to limitations in resolving their genomic context. We aimed to assess the utility of long-read genome sequencing (lrGS) in characterizing DMD duplications and guiding clinical interpretation. Methods: We applied low-coverage lrGS (3-10x depth; ~8.2 kb mean read length) to 18 individuals with DMD duplications identified via short-read sequencing. Structural variant calling and breakpoint localization were validated by Sanger sequencing. In addition, the genomic characteristics of the duplication breakpoints were systematically analyzed. Results: lrGS resolved duplication architecture in all cases. Two duplications (11%, 2/18) were extragenic and reclassified as benign; 16 (89%, 16/18) were tandem events within DMD. Among tandem duplications, 50% (8/16) were classified as pathogenic/likely pathogenic and 50% (8/16) as variants of uncertain significance. Breakpoints were consistently located in intronic regions, often flanked by repetitive elements. Conclusion: Low-coverage lrGS enables high-resolution mapping of DMD duplications and improves variant classification. This approach addresses a key gap in carrier screening and molecular diagnosis of dystrophinopathies, and provides lrGS as a prototype for decoding duplication architecture of monogenic disorders, which is a critical advance in genetic diagnosis.

15

Medically relevant tandem repeats in nanopore sequencing of control cohorts

De Coster, W.; Hoijer, I.; Bruggeman, I.; D'Hert, S.; Melin, M.; Ameur, A.; Rademakers, R.

2024-03-14 genetic and genomic medicine 10.1101/2024.03.06.24303700 medRxiv

Top 0.1%

22.2%

Research and diagnostics for medically relevant tandem repeats and repeat expansions are hampered by the lack of population-scale databases. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in the general population. The current version includes 878 individuals of the 1000 Genomes Project cohort sequenced on the Oxford Nanopore Technologies PromethION. A comprehensive set of medically relevant tandem repeats was genotyped using STRdust to determine the tandem repeat length and sequence composition. PathSTR provides rich visualizations of this dataset, as well as the option to upload one's own data for comparison alongside the control cohort. We demonstrate the implementation of this application using data from targeted nanopore sequencing of a patient with Myotonic Dystrophy type 1. This resource will empower the genetics community to get a more complete overview of normal variation in tandem repeat length and sequence composition, and enable a better assessment of the pathogenic impact of tandem repeats observed in patients. PathSTR is available at https://pathstr.bioinf.be

16

A single heterozygous mutation in COG4 disrupts zebrafish early development via Wnt signaling

Xia, Z.-J.; Zeng, X.-X. I.; Tambe, M.; Ng, B. G.; Dong, D. S.; Freeze, H. H.

2021-05-23 developmental biology 10.1101/2021.05.23.443307 bioRxiv

Top 0.1%

22.2%

Saul-Wilson syndrome (SWS) is a rare skeletal dysplasia with progeroid appearance and primordial dwarfism. It is caused by a heterozygous, dominant variant (p.G516R) in COG4, a subunit of the Conserved Oligomeric Golgi (COG) complex involved in intracellular vesicular transport. Our previous work has shown the intracellular disturbances caused by this mutation; however, the pathological mechanism of SWS needs further investigation. We sought to understand the molecular mechanism of specific aspects of the SWS phenotype by analyzing SWS-derived fibroblasts and zebrafish embryos expressing this dominant variant. SWS fibroblasts accumulate glypicans, a group of heparan sulfate proteoglycans (HSPGs) critical for growth and bone development through multiple signaling pathways. Consistently, we find that glypicans are increased in embryos expressing the COG4 p.G516R variant. These animals show phenotypes consistent with convergent extension (CE) defects during gastrulation, shortened body length, and malformed jaw cartilage chondrocyte intercalation at larval stages. Since non-canonical Wnt signaling was shown in zebrafish to be related to the regulation of these processes by Glypican 4, we assessed wnt levels and found a selective increase of wnt4 transcripts in the presence of COG4 p.G516R. Moreover, overexpression of wnt4 mRNA phenocopies these developmental defects. LGK974, an inhibitor of Wnt signaling, corrects the shortened body length at low concentrations but amplifies it at slightly higher concentrations. WNT4 and the non-canonical Wnt signaling component phospho-JNK are also elevated in cultured SWS-derived fibroblasts. Similar results from SWS cell lines and zebrafish point to altered non-canonical Wnt signaling as one possible mechanism underlying SWS pathology.

17

Using computational approaches to enhance the interpretation of missense variants in the PAX6 gene

Andhika, N. S.; Biswas, S.; Hardcastle, C.; Green, D.; Ramsden, S.; Birney, E. J.; Black, G. C.; Sergouniotis, P.

2023-12-26 genetic and genomic medicine 10.1101/2023.12.21.23300370 medRxiv

Top 0.1%

22.1%

Purpose: The PAX6 gene encodes a highly conserved transcription factor involved in eye development. Heterozygous loss-of-function variants in PAX6 can cause a range of ophthalmic disorders including aniridia. A key molecular diagnostic challenge is that many PAX6 missense changes are presently classified as variants of uncertain significance. While computational tools can be used to assess the effect of genetic alterations, the accuracy of their predictions varies. Here, we evaluated and optimised the performance of computational prediction tools in relation to PAX6 missense variants. Methods: Through inspection of publicly available resources (including HGMD, ClinVar, LOVD and gnomAD), we identified 241 PAX6 missense variants that were used for model training and evaluation. The performance of ten commonly used computational tools was assessed and a threshold optimization approach was utilized to determine optimal cut-off values. Validation studies were subsequently undertaken using PAX6 variants from a local database. Results: AlphaMissense, SIFT4G and REVEL emerged as the best-performing predictors; the optimized thresholds of these tools were 0.967, 0.025, and 0.772, respectively. Combining the predictions from these top three tools resulted in lower performance compared to using AlphaMissense alone. Conclusion: Tailoring the use of computational tools by employing optimized thresholds specific to PAX6 can enhance algorithmic performance. Our findings have implications for PAX6 variant interpretation in clinical settings.
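
To make the idea of gene-specific threshold optimization in the PAX6 abstract above more concrete, here is a minimal Python sketch. The abstract does not say which objective was optimized, so the use of Youden's J statistic (sensitivity + specificity - 1) is an assumption, as are the function names and the toy scores and labels, which are hypothetical and not taken from the study. The sketch also assumes that higher scores indicate pathogenicity; for a predictor where lower scores indicate a damaging change, as with SIFT-style scores, the comparison would need to be flipped.

```python
def youden_j(scores, labels, threshold):
    """Sensitivity + specificity - 1 when scores >= threshold are called pathogenic."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_threshold(scores, labels):
    """Try every observed score as a cut-off and return the one with the highest J."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_j(scores, labels, t))


# Hypothetical toy data (assumption): 1 = pathogenic, 0 = benign.
toy_scores = [0.10, 0.35, 0.40, 0.60, 0.80, 0.90, 0.97]
toy_labels = [0, 0, 1, 0, 1, 1, 1]

chosen = best_threshold(toy_scores, toy_labels)
print(chosen, youden_j(toy_scores, toy_labels, chosen))
```

A threshold chosen this way depends on the training variants, which is consistent with the abstract's point that optimized cut-offs are specific to PAX6 and were checked against variants from a separate local database.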

18

Toward Automatic Variant Interpretation: Discordant Genetic Interpretation Across Variant Annotations for ClinVar Pathogenic Variants

Chen, A. Y.-A.; Yuan, T.-H.; Huang, J.-H.; Wang, Y.-B.; Hung, T.-M.; Chen, C.-Y.; Hsu, J. S.; Chen, P.-L.

2024-10-15 genomics 10.1101/2024.10.11.617756 bioRxiv

Top 0.1%

19.6%

Purpose: High-throughput sequencing has revolutionized genetic disorder diagnosis, but variant pathogenicity interpretation is still challenging. Even though the Human Genome Variation Society (HGVS) provides recommendations for variant nomenclature, discrepancies in annotation remain a significant hurdle. Methods: This study evaluated the annotation concordance between three tools (ANNOVAR, SnpEff, and Variant Effect Predictor (VEP)) using 164,549 two-star variants from ClinVar. The analysis used HGVS nomenclature string-match comparisons to assess annotation consistency from each tool, corresponding coding impacts, and associated ACMG criteria inferred from the annotations. Results: The analysis revealed variable concordance rates, with 58.52% agreement for HGVSc, 84.04% for HGVSp, and 85.58% for the coding impact. SnpEff showed the highest match for HGVSc (0.988), while VEP performed best for HGVSp (0.977). Substantial discrepancies were noted in the Loss-of-Function (LoF) category. Incorrect PVS1 interpretations affected the final pathogenicity and downgraded PLP variants (ANNOVAR 55.9%, SnpEff 66.5%, VEP 67.3%), risking false negatives of clinically relevant variants in reports. Conclusions: These findings highlight the critical challenges in accurately interpreting variant pathogenicity due to discrepancies in annotations. To enhance the reliability of genetic variant interpretation in clinical practice, standardizing transcript sets and systematically cross-validating results across multiple annotation tools is essential.
Graphic abstract (Figure 1): This study examined the consistency of variant annotations produced by three widely used open-source tools (ANNOVAR, SnpEff, and VEP) against 164,549 ClinVar two-star variants. The investigation covers HGVS-based transcript and protein nomenclature and coding impact annotation. The results showed that none of the tools were fully consistent with ClinVar across all coding impact categories, particularly in the LoF category, which exhibited the poorest consistency. This inconsistency may lead to discrepancies in PVS1 interpretation, affecting the final pathogenicity assessment. PVS1 loss resulted in a significant downgrading of PLP variants, potentially leading to the omission of clinically relevant variants in reports.
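
The string-match concordance described in the abstract above can be illustrated with a short Python sketch. This is not the authors' pipeline: the tool names (`tool_a`, `tool_b`, `tool_c`), the variant identifiers and the HGVS-like strings are hypothetical placeholders, and exact string equality is assumed as the match criterion.

```python
def overall_concordance(annotations):
    """Fraction of variants for which every tool emits the identical string."""
    agree = sum(
        1 for per_tool in annotations.values()
        if len(set(per_tool.values())) == 1
    )
    return agree / len(annotations)


def pairwise_match(annotations, tool_x, tool_y):
    """Fraction of variants on which two named tools emit the identical string."""
    same = sum(
        1 for per_tool in annotations.values()
        if per_tool[tool_x] == per_tool[tool_y]
    )
    return same / len(annotations)


# Hypothetical annotations (assumption): variant id -> {tool name -> HGVSc-like string}.
toy_annotations = {
    "var1": {"tool_a": "c.100A>G", "tool_b": "c.100A>G", "tool_c": "c.100A>G"},
    "var2": {"tool_a": "c.200del", "tool_b": "c.201del", "tool_c": "c.201del"},
    "var3": {"tool_a": "c.300_301insT", "tool_b": "c.300dup", "tool_c": "c.300dup"},
    "var4": {"tool_a": "c.400C>T", "tool_b": "c.400C>T", "tool_c": "c.400C>T"},
}

print(overall_concordance(toy_annotations))
print(pairwise_match(toy_annotations, "tool_b", "tool_c"))
```

In this toy set, two of the four variants receive differently written descriptions from one tool, so exact string matching counts them as discordant. A comparison of this kind measures agreement in nomenclature only.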

19

One-Sided Matching Portal (OSMP): a tool to facilitate rare disease patient matchmaking

Osmond, M.; Price, E. M.; Buske, O. J.; Frew, M.; Couse, M.; Hartley, T.; Klamann, C.; Le, H. G. B. H.; Xu, J.; So, D.; Jain, A.; Lu, K.; Mo, K.; Wyllie, H.; Wall, E.; Driver, H. G.; Cheung, W.; Cohen, A. S. A.; Farrow, E. G.; Thiffault, I.; Care4Rare Canada Consortium, ; Turinsky, A.; Pastinen, T.; Brudno, M.; Boycott, K. M.

2024-09-04 genetic and genomic medicine 10.1101/2024.09.03.24313012 medRxiv

Top 0.1%

18.9%

Background: Genomic matchmaking - the process of identifying multiple individuals with overlapping phenotypes and rare variants in the same gene - is an important tool facilitating gene discoveries for unsolved rare genetic disease (RGD) patients. Current approaches are two-sided, meaning both patients being matched must have the same candidate gene flagged. This limits the number of unsolved RGD patients eligible for matchmaking. A one-sided approach to matchmaking, in which a gene of interest is queried directly in the genome-wide sequencing data of RGD patients, would make matchmaking possible for previously undiscoverable individuals. However, platforms and workflows for this approach have not been well established. Results: We released a beta version of the One-Sided Matching Portal (OSMP), a platform capable of performing one-sided matchmaking queries across thousands of participants stored in genomic databases. The OSMP returns variant-level and participant-level information on each variant occurrence (VO) identified in a queried gene and displays this information through a customizable data table. A workflow for one-sided matchmaking was developed so that researchers could effectively prioritize the many VOs returned from a given query. This workflow was then tested through pilot studies where two sets of genes were queried in over 2,500 individuals: 130 genes that were newly associated with disease in OMIM, and 178 candidate genes that did not yet have a described disease-gene association in OMIM. These pilots both returned a large number of initial VOs (12,872 and 20,308, respectively); however, the workflow successfully filtered out over 99.8% of these VOs before they were sent for review by a patient's clinician.
Filters on participant-level information, such as variant zygosity, participant phenotype, and whether a variant was also present in unaffected participants, were especially effective in this workflow at reducing the number of false positive matches. Conclusions: As demonstrated through the two pilot studies, one-sided matchmaking queries can be efficiently performed using the OSMP. The availability of variant-level and participant-level data is key to ensuring this approach is practical for researchers. In the future, the OSMP will be connected to additional RD databases to increase the accessibility of matchmaking to unsolved RGD patients.
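
The participant-level filters named in the abstract above (zygosity, phenotype, presence in unaffected participants) can be sketched in Python as follows. This is an illustration only and not the OSMP implementation: the `VariantOccurrence` class, its field names, the `prioritize` function and the toy records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantOccurrence:
    """One variant occurrence (VO) returned by a gene query (hypothetical schema)."""
    gene: str
    zygosity: str              # "het" or "hom"
    phenotype_overlap: bool    # participant phenotype fits the queried gene
    seen_in_unaffected: bool   # same variant also found in an unaffected participant


def prioritize(vos, expected_zygosity):
    """Keep only VOs that pass all three participant-level filters."""
    return [
        vo for vo in vos
        if vo.zygosity == expected_zygosity
        and vo.phenotype_overlap
        and not vo.seen_in_unaffected
    ]


# Hypothetical query result (assumption): four VOs in a placeholder gene.
toy_vos = [
    VariantOccurrence("GENE_X", "het", True, False),
    VariantOccurrence("GENE_X", "het", True, True),
    VariantOccurrence("GENE_X", "hom", True, False),
    VariantOccurrence("GENE_X", "het", False, False),
]

shortlist = prioritize(toy_vos, expected_zygosity="het")
print(len(shortlist), "of", len(toy_vos), "VOs kept for review")
```

Applying the filters conjunctively is a design choice of this sketch; the abstract reports which kinds of filters were most effective but not how they were combined.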

20

Rethinking the pathogenicity of intragenic DMD duplications detected by carrier screening: high prevalence of non-tandem duplications revealed by long-read sequencing

Ding, Q.; Balan, J.; Vidal-Folch, N.; Pickart, A. M.; Sun, G.; Walsh, J. R.; Majumdar, R.; Klee, E. W.; Murphy, S. J.; Oglesbee, D.; Rowsey, R. A.; Hasadsri, L.

2025-04-14 genetic and genomic medicine 10.1101/2025.04.10.25325596 medRxiv

Top 0.1%

18.8%

Purpose: The pathogenicity of intragenic duplications depends on their structural configuration. Tandem duplications often disrupt reading frames and cause gene loss-of-function, whereas interspersed (non-tandem) duplications are largely benign. When the configuration cannot be determined, current guidelines presume a tandem structure, leading to some laboratories automatically classifying such variants as likely pathogenic or pathogenic. This study evaluates the validity of this presumption for DMD, in patients with and without clinical indications of dystrophinopathy. Methods: We performed high-coverage whole-genome long-read sequencing on 15 patients with intragenic DMD duplications. Four patients had clinically indicated dystrophinopathy testing, while in the remaining 11 patients, the duplications were detected without clear indications of dystrophinopathy (e.g., "incidentally detected" through carrier screening). Results: All four patients with clinical indications had tandem duplications. In contrast, 64% (7/11) of the incidentally detected cases had interspersed duplications, with four subsequently re-classified as likely benign, two likely pathogenic, and one uncertain. These duplications were often complex, involving co-duplications or co-deletions with other regions. Conclusion: Our findings challenge the presumption that intragenic DMD duplications are predominantly tandem. This highlights the need for a cautious variant interpretation approach, particularly in carrier screening and other settings where variants are identified without indications of dystrophinopathy.